51 research outputs found
BioSimulator.jl: Stochastic simulation in Julia
Biological systems with intertwined feedback loops pose a challenge to
mathematical modeling efforts. Moreover, rare events, such as mutation and
extinction, complicate system dynamics. Stochastic simulation algorithms are
useful in generating time-evolution trajectories for these systems because they
can adequately capture the influence of random fluctuations and quantify rare
events. We present a simple and flexible package, BioSimulator.jl, for
implementing the Gillespie algorithm, τ-leaping, and related stochastic
simulation algorithms. The objective of this work is to provide scientists
across domains with fast, user-friendly simulation tools. We used the
high-performance programming language Julia because of its emphasis on
scientific computing. Our software package implements a suite of stochastic
simulation algorithms based on Markov chain theory. We provide the ability to
(a) diagram Petri Nets describing interactions, (b) plot average trajectories
and attached standard deviations of each participating species over time, and
(c) generate frequency distributions of each species at a specified time.
BioSimulator.jl's interface allows users to build models programmatically
within Julia. A model is then passed to the simulate routine to generate
simulation data. The built-in tools allow one to visualize results and compute
summary statistics. Our examples highlight the broad applicability of our
software to systems of varying complexity from ecology, systems biology,
chemistry, and genetics. The user-friendly nature of BioSimulator.jl encourages
the use of stochastic simulation, minimizes tedious programming efforts, and
reduces errors during model specification. Comment: 27 pages, 5 figures, 3 tables
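To make the Gillespie algorithm mentioned in this abstract concrete, here is a minimal sketch of the direct method for a single-species birth-death process. It is written in plain Python rather than Julia and does not use or reproduce BioSimulator.jl's interface; the function name, parameters, and the choice of model are illustrative assumptions only.

```python
import random


def gillespie_birth_death(x0, birth_rate, death_rate, t_end, seed=None):
    """Simulate X -> 2X (birth) and X -> 0 (death) with the Gillespie direct method.

    Hypothetical helper for illustration; not part of BioSimulator.jl.
    Returns the jump times and the population count after each jump.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        # Propensities are proportional to the current population size.
        a_birth = birth_rate * x
        a_death = death_rate * x
        a_total = a_birth + a_death
        if a_total == 0.0:
            break  # population is extinct (or rates are zero): nothing can fire
        # The waiting time to the next reaction is exponential with rate a_total.
        t_next = t + rng.expovariate(a_total)
        if t_next > t_end:
            break
        t = t_next
        # Pick which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states
```

Calling `gillespie_birth_death(10, 1.0, 1.2, 5.0, seed=1)` returns one trajectory; averaging many such runs gives the kind of mean trajectories and standard deviations the abstract describes.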
Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity.
Background: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression. Results: We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a 2-3 orders of magnitude decrease in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies. Conclusions: Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for genome-wide association studies inference, our extensions are pertinent to other regression problems with large numbers of predictors.
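As a rough illustration of the core idea behind iterative hard thresholding, the sketch below alternates a gradient step with a projection onto vectors that have at most k nonzero entries. It assumes an ordinary least-squares loss and a fixed, user-supplied step size; the generalized linear models, prior weights, and group sparsity described in the abstract are not covered, and the function names are hypothetical rather than taken from the authors' software.

```python
def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:k])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def iht_least_squares(X, y, k, step, n_iter=200):
    """Sparse least squares via iterative hard thresholding (illustrative sketch).

    X is a list of rows, y a list of responses, k the number of nonzero
    coefficients allowed, and step an assumed fixed step size.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residual r = y - X beta.
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Descent direction X^T r (negative gradient of 0.5 * ||y - X beta||^2).
        direction = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by projection onto k-sparse vectors.
        beta = hard_threshold([beta[j] + step * direction[j] for j in range(p)], k)
    return beta
```

With an identity design matrix the procedure simply keeps the k largest responses, which makes the projection step easy to check by hand; in practice the step size must be chosen small enough for the iteration to converge.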
Ten Simple Rules for Getting Help from Online Scientific Communities
The increasing complexity of research requires scientists to work at the intersection of multiple fields and to face problems for which their formal education has not prepared them. For example, biologists with no or little background in programming are now often using complex scripts to handle the results from their experiments; vice versa, programmers wishing to enter the world of bioinformatics must know about biochemistry, genetics, and other fields.
In this context, communication tools such as mailing lists, web forums, and online communities acquire increasing importance. These tools permit scientists to quickly contact people skilled in a specialized field. A question posed properly to the right online scientific community can help in solving difficult problems, often faster than screening literature or writing to publication authors. The growth of active online scientific communities, such as those listed in Table S1, demonstrates how these tools are becoming an important source of support for an increasing number of researchers.
Nevertheless, making proper use of these resources is not easy. Adhering to the social norms of World Wide Web communication—loosely termed “netiquette”—is both important and non-trivial.
In this article, we take inspiration from our experience on Internet-shared scientific knowledge, and from similar documents such as “Asking the Questions the Smart Way” and “Getting Answers”, to provide guidelines and suggestions on how to use online communities to solve scientific problems
OPENMENDEL: A Cooperative Programming Project for Statistical Genetics
Statistical methods for genomewide association studies (GWAS) continue to
improve. However, the increasing volume and variety of genetic and genomic data
make computational speed and ease of data manipulation mandatory in future
software. In our view, a collaborative effort of statistical geneticists is
required to develop open source software targeted to genetic epidemiology. Our
attempt to meet this need is called the OPENMENDEL project
(https://openmendel.github.io). It aims to (1) enable interactive and
reproducible analyses with informative intermediate results, (2) scale to big
data analytics, (3) embrace parallel and distributed computing, (4) adapt to
rapid hardware evolution, (5) allow cloud computing, (6) allow integration of
varied genetic data types, and (7) foster easy communication between
clinicians, geneticists, statisticians, and computer scientists. This article
reviews and makes recommendations to the genetic epidemiology community in the
context of the OPENMENDEL project. Comment: 16 pages, 2 figures, 2 tables
The dynamics of human body weight change
An imbalance between energy intake and energy expenditure will lead to a
change in body weight (mass) and body composition (fat and lean masses). A
quantitative understanding of the processes involved, which currently remains
lacking, will be useful in determining the etiology and treatment of obesity
and other conditions resulting from prolonged energy imbalance. Here, we show
that the long-term dynamics of human weight change can be captured by a
mathematical model of the macronutrient flux balances and all previous models
are special cases of this model. We show that the generic dynamical behavior of
body composition for a clamped diet can be divided into two classes. In the
first class, the body composition and mass are determined uniquely. In the
second class, the body composition can exist at an infinite number of possible
states. Surprisingly, perturbations of dietary energy intake or energy
expenditure can give identical responses in both model classes and existing
data are insufficient to distinguish between these two possibilities. However,
this distinction is important for the efficacy of clinical interventions that
alter body composition and mass
Whole-genome sequencing of pharmacogenetic drug response in racially diverse children with asthma
RATIONALE: Albuterol, a bronchodilator medication, is the first-line therapy for asthma worldwide. There are significant racial/ethnic differences in albuterol drug response.
OBJECTIVES: To identify genetic variants important for bronchodilator drug response (BDR) in racially diverse children.
METHODS: We performed the first whole-genome sequencing pharmacogenetics study from 1,441 children with asthma from the tails of the BDR distribution to identify genetic association with BDR.
MEASUREMENTS AND MAIN RESULTS: We identified population-specific and shared genetic variants associated with BDR, including genome-wide significant (P < 3.53 × 10⁻⁷) and suggestive (P < 7.06 × 10⁻⁶) loci near genes previously associated with lung capacity (DNAH5), immunity (NFKB1 and PLCB1), and β-adrenergic signaling (ADAMTS3 and COX18). Functional analyses of the BDR-associated SNP in NFKB1 revealed potential regulatory function in bronchial smooth muscle cells. The SNP is also an expression quantitative trait locus for a neighboring gene, SLC39A8. The lack of other asthma study populations with BDR and whole-genome sequencing data on minority children makes it impossible to perform replication of our rare variant associations. Minority underrepresentation also poses significant challenges to identify age-matched and population-matched cohorts of sufficient sample size for replication of our common variant findings.
CONCLUSIONS: The lack of minority data, despite a collaboration of eight universities and 13 individual laboratories, highlights the urgent need for a dedicated national effort to prioritize diversity in research. Our study expands the understanding of pharmacogenetic analyses in racially/ethnically diverse populations and advances the foundation for precision medicine in at-risk and understudied minority populations
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.